Cartwright, Gonzalez, & Piro 






Pitch perception: A dynamical-systems perspective 

Julyan H. E. Cartwright*, Diego L. Gonzalez†, & Oreste Piro‡

* Laboratorio de Estudios Cristalográficos, CSIC, E-18071 Granada, Spain.
  E-mail julyan@lec.ugr.es, Web http://lec.ugr.es/~julyan
† Istituto Lamel, CNR, I-40129 Bologna, Italy.
  E-mail gonzalez@lamel.bo.cnr.it
‡ Institut Mediterrani d'Estudis Avançats, CSIC-UIB, E-07071 Palma de Mallorca, Spain.
  E-mail piro@imedea.uib.es, Web http://www.imedea.uib.es/~piro



Published in: Proc. Nat. Acad. Sci. USA 98, 4855-4859, 2001. 



Two and a half millennia ago Pythagoras initiated the scientific study of the pitch of sounds;
yet our understanding of the mechanisms of pitch perception remains incomplete. Physical
models of pitch perception try to explain from elementary principles why certain physical
characteristics of the stimulus lead to particular pitch sensations. There are two broad
categories of pitch-perception models: place or spectral models consider that pitch is mainly
related to the Fourier spectrum of the stimulus, whereas for periodicity or temporal models its
characteristics in the time domain are more important. Current models from either class are
usually computationally intensive, implementing a series of steps more or less supported by
auditory physiology. However, the brain has to analyse and react in real time to an enormous
amount of information from the ear and other senses. How is all this information efficiently
represented and processed in the nervous system? A proposal of nonlinear and complex systems
research is that dynamical attractors may form the basis of neural information processing.
Because the auditory system is a complex and highly nonlinear dynamical system it is natural
to suppose that dynamical attractors may carry perceptual and functional meaning. Here we
show that this idea, scarcely developed in current pitch models, can be successfully applied to
pitch perception.

The pitch of a sound is where we perceive it to lie on a musical scale. For a pure tone
with a single frequency component, pitch rises monotonically with frequency. However,
more complex signals also elicit a pitch sensation. Some instances are presented in Fig. 1.
These are sounds produced by the nonlinear interaction of two or more periodic sources,
by amplitude or frequency modulation. All such stimuli, which may be termed complex
tones, produce a definite pitch sensation, and all of them exhibit a certain spectral
periodicity. Many natural sounds have this quality, including vowel sounds in human speech
and vocalizations of many other animals. Evidence for the importance of spectral
periodicity in sound processing by humans is that noisy stimuli exhibiting this property also
elicit a pitch sensation. An example is repetition pitch: the pitch of ripple noise, which
arises naturally when the sound from a noisy source interacts with a delayed version of
itself, produced, for example, by a single or multiple echo. It is clear that an efficient
mechanism for the analysis and recognition of complex tones represents an evolutionary
advantage for an organism. In this light, the pitch percept may be seen as an effective
one-parameter categorization of sounds possessing some spectral periodicity.

Virtual Pitch 

Figure 1: Stimuli: waveforms, Fourier spectra, and pitches. (a) 1 kHz pure tone; the
pitch coincides with the frequency $\omega_0$. (b) Complex tone formed by 200 Hz fundamental
plus overtones; the pitch is at the frequency of the fundamental $\omega_0$. (c) After high-pass
filtering of the previous tone to remove the fundamental and the first few overtones, the
pitch $\omega_0$ remains at the frequency of the missing fundamental (dotted). (d) The result of
frequency modulation of a 1 kHz pure tone carrier by a 200 Hz pure tone modulant. (e)
Complex tone produced by amplitude modulation of a 1 kHz pure tone carrier by a 200
Hz pure tone modulant; the pitch coincides with the difference combination tone $\omega_0$. (f)
Result of shifting the partials of the previous tone in frequency by $\Delta\omega = 90$ Hz; the pitch
shifts by $\Delta\omega_0 \approx 20$ Hz, although the difference combination tone does not. (g) Schematic
diagram: the frequency line details (above the line) the pitch-shift behaviour of (f) and
(below the line) the three-frequency resonance we propose to explain it.

For a harmonic stimulus like Fig. 1b (a periodic signal), there is a natural physical
solution to the problem of encoding it with a single parameter: take the fundamental
component of the stimulus as the pitch, and all other components are naturally recorded as the
higher harmonics of the fundamental. This is what nature does. However, a harmonic
stimulus like Fig. 1c, which is high-pass filtered so that the fundamental and some of the
first higher harmonics are eliminated, nevertheless maintains its pitch at the frequency
of the absent fundamental. The stimulus (Fig. 1e) obtained by amplitude modulation of
a sinusoidal carrier of 1 kHz by a sinusoidal modulant of 200 Hz is also of this type. As
the carrier and modulant are rationally related, the stimulus is harmonic; the partials are
integer multiples of the absent fundamental $\omega_0 = 200$ Hz. The perception of pitch for this
kind of stimulus is known as the problem of the missing fundamental, virtual pitch, or
residue perception. The first physical theory for the phenomenon was proposed by von
Helmholtz, who attributed it to the generation of difference combination tones in the
nonlinearities of the ear. A passive nonlinearity fed by two sources with frequencies $\omega_1$
and $\omega_2$ generates combination tones of frequency $\omega_C$ (see the Appendix for clarification
of the concepts from nonlinear dynamics used throughout this paper). For a harmonic
complex tone, such as Fig. 1e, the difference combination tone $\omega_C = \omega_2 - \omega_1$ between two
successive partials has the frequency of the missing fundamental $\omega_0$. In a crucial
experiment, however, Schouten et al. demonstrated that the residue cannot be described by a
difference combination tone: if we shift all the partials in frequency by the same amount
$\Delta\omega$ (Fig. 1f), the difference combination tone remains unchanged. But the perceived pitch
shifts, with a linear dependence on $\Delta\omega$.
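
The argument against the difference-tone explanation can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the helper names `partials`, `difference_tones`, and `is_harmonic` are hypothetical, and the stimulus values are those quoted for Fig. 1e and 1f (1 kHz carrier, 200 Hz modulant, 90 Hz shift).

```python
def partials(carrier_hz, modulant_hz, shift_hz=0.0):
    """Three partials of an amplitude-modulated tone, all shifted by shift_hz."""
    return [carrier_hz - modulant_hz + shift_hz,
            carrier_hz + shift_hz,
            carrier_hz + modulant_hz + shift_hz]


def difference_tones(freqs):
    """Difference combination tones between successive partials."""
    return [hi - lo for lo, hi in zip(freqs, freqs[1:])]


def is_harmonic(freqs, fundamental_hz, tol=1e-9):
    """True if every partial is an integer multiple of fundamental_hz."""
    return all(abs(f / fundamental_hz - round(f / fundamental_hz)) < tol
               for f in freqs)


harmonic = partials(1000.0, 200.0)                # 800, 1000, 1200 Hz
shifted = partials(1000.0, 200.0, shift_hz=90.0)  # 890, 1090, 1290 Hz

print(difference_tones(harmonic))  # [200.0, 200.0]
print(difference_tones(shifted))   # [200.0, 200.0], unchanged by the shift
print(is_harmonic(harmonic, 200.0), is_harmonic(shifted, 200.0))  # True False
```

The difference tone stays at 200 Hz while the shifted partials stop being multiples of 200 Hz, so a difference tone alone cannot follow the perceived pitch shift.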

A Dynamical-Systems Perspective 

Such a complex tone is no longer harmonic. How does nature encode an inharmonic
complex tone into a single pitch? Intuitively, the shifted pseudofundamental depicted in
Fig. 1g might seem to be a better choice than the unshifted fundamental, which
corresponds to the difference combination tone. However, from a mathematical point of view,
this is not obvious. The ratios between successive partials of the shifted stimulus are
irrational and we cannot represent them as higher harmonics of a nonzero fundamental
frequency; the true fundamental would have frequency zero. Some kind of approximation
is needed. The approximation of two arbitrary frequencies, $\omega_1$ and $\omega_2$, by the
harmonics of a third, $\omega_R$, is equivalent to the mathematical problem of finding a strongly
convergent sequence of pairs of rational numbers with the same denominator that
simultaneously approximates the two frequency ratios, $\omega_1/\omega_R$ and $\omega_2/\omega_R$. If we consider the
approximation to only one frequency ratio there exists a general solution given by the
continued-fraction algorithm. However, for two frequency ratios a general solution is
not known. Some algorithms have been proposed that work for particular values of the
frequency ratios or that are weakly convergent. We developed an alternative approach.
The idea is to equate the distances between appropriate harmonics of the
pseudofundamental and the pair of frequencies we wish to approximate. In this way the two
approximations are equally good or bad. The problem can then be solved by a
generalization of the Farey sum. This approach enables the hierarchical classification of a type
of dynamical attractors found in systems with three frequencies: three-frequency
resonances $[p, q, r]$.
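
For the single-ratio case mentioned above, the continued-fraction algorithm is easy to illustrate. This is a minimal sketch using exact rational arithmetic; the function name `convergents` and the example ratio 1090/890 (the two lowest partials of the shifted tone of Fig. 1f) are choices made for illustration, not taken from the original text.

```python
from fractions import Fraction


def convergents(ratio, max_terms=10):
    """Successive continued-fraction convergents of a positive rational number."""
    result = []
    h_prev, h = 0, 1   # numerators h_{-2}, h_{-1}
    k_prev, k = 1, 0   # denominators k_{-2}, k_{-1}
    x = Fraction(ratio)
    for _ in range(max_terms):
        a = x.numerator // x.denominator       # integer part of x
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
        if x == a:                             # expansion has terminated
            break
        x = 1 / (x - a)
    return result


approximations = convergents(Fraction(1090, 890))
print([str(c) for c in approximations])  # ['1', '5/4', '11/9', '49/40', '109/89']
```

The second convergent, 5/4, is the ratio of two successive integers, the kind of low-order approximation that the argument in the next paragraphs relies on for small detuning.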

A classification of three-frequency resonances allows us to propose how nature might
encode an inharmonic complex tone into a single pitch percept. The pitch of a complex
tone corresponds to a one-parameter categorization of sounds by a physical frequency
whose harmonics are good approximations to the partials of the complex. This physical
frequency is naturally generated as a universal response of a nonlinear dynamical
system (the auditory system, or some specialized subsystem of it) under the action of an
external force, namely the stimulus. Psychophysical experiments with multicomponent
stimuli suggest that the lowest-frequency components are usually dominant in
determining residue perception. Thus we represent the external force as a first approximation by
the two lowest-frequency components of the stimulus. For pitch-shift experiments with
small frequency detuning $\Delta\omega$, such as those of Schouten et al., the proximity of these two
lowest components $\omega_1 = k\omega_0 + \Delta\omega$ and $\omega_2 = (k+1)\omega_0 + \Delta\omega$ to successive multiples
of some missing fundamental ensures that $(k+1)/k$ is a good rational approximation to
their frequency ratio. Hence we concentrate on a small interval between the frequencies
$\omega_1/k$ and $\omega_2/(k+1)$ around the missing fundamental of the nonshifted case. These
frequencies correspond to the three-frequency resonances $[0, -1, k]$ and $[-1, 0, k+1]$. We
suppose that the residue should be associated with the largest three-frequency resonance
in this interval: the daughter of these resonances, $[-1, -1, 2k+1]$. If this reasoning is
correct, the three-frequency resonance formed between the two lowest-frequency
components of the complex tone and the response frequency $P = (\omega_1 + \omega_2)/(2k+1)$ gives
rise to the perceived residue pitch $P$.
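
As a numerical illustration of the formula for $P$, the sketch below evaluates it for the shifted tone of Fig. 1f, taking the two lowest partials (890 Hz and 1090 Hz, so $k = 4$). The function name `residue_pitch` is hypothetical. The two assertions check consequences that follow algebraically from the definition: $P$ is linear in the detuning, $P = \omega_0 + \Delta\omega/(k + 1/2)$, and the $k$-th and $(k+1)$-th harmonics of $P$ miss the two partials by the same amount.

```python
from fractions import Fraction


def residue_pitch(w1, w2, k):
    """Response frequency P of the three-frequency resonance [-1, -1, 2k+1]."""
    return (w1 + w2) / (2 * k + 1)


w0, dw, k = Fraction(200), Fraction(90), 4
w1 = k * w0 + dw            # 890 Hz, lowest partial
w2 = (k + 1) * w0 + dw      # 1090 Hz, next partial

P = residue_pitch(w1, w2, k)
print(P)  # 220

# Linear dependence on the detuning: P = w0 + dw / (k + 1/2).
assert P == w0 + dw / (k + Fraction(1, 2))

# The harmonics k*P and (k+1)*P are equally distant from w1 and w2.
assert k * P - w1 == w2 - (k + 1) * P
```

The predicted shift of 20 Hz for a 90 Hz detuning agrees with the value quoted in the caption of Fig. 1.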

Results 

As we showed in earlier work, there is good agreement between the pitch perceived
in experiments and the three-frequency resonance produced by the two lowest-frequency
components of the complex tone for intermediate harmonic numbers $3 < k < 8$. For high
and low $k$ values there are systematic deviations from these predictions. Such deviations,
noted in pitch-perception modelling, are explained by the dominance effect: there is a
frequency window of preferred stimulus components, so that not all components are
equally important in determining residue perception. In order to describe these slope
deviations for high and low $k$ values within our approach, we must, instead of taking
the lowest-frequency components, use some effective $k$ that depends on the dominance
effect. In this, we also take into account the presence of difference combination tones,
which provide some components with values of $k$ not present in the original stimulus. In
Fig. 2 we have superimposed the predicted three-frequency resonances, including the
dominance effect, on published experimental pitch-shift data. For stimuli consisting only of
high-$k$ components, the window of the dominance region is almost empty, and difference
combination tones of lower $k$ can become more important than the primary components
in determining the pitch of the stimulus. The result of this modification is a saturation of
the slopes that correctly describes the experimental data. A saturation of slopes can also
be seen in the experimental data for low values of $k$. This effect too can be explained in
terms of the dominance region. For a 200 Hz stimulus spacing, the region is situated at
about 800 Hz; this implies that stimulus components with harmonic numbers $n$ and $n+1$
other than the two lowest ones (i.e., $n > k$) become more important for determining
the three-frequency resonance that provides the residue pitch. Again, incorporating this
modification, we can correctly predict the experimental data.

Figure 2: Experimental data (red dots) from Gerson & Goldstein (0-800 Hz range) and
from Schouten et al. (1200-2200 Hz range) show pitch as a function of the lower
frequency $f = k\omega_0 + \Delta\omega$ of a complex tone
$\{k\omega_0 + \Delta\omega, (k+1)\omega_0 + \Delta\omega, (k+2)\omega_0 + \Delta\omega, \ldots\}$
with the partials spaced $g = \omega_0 = 200$ Hz apart. The data of Schouten et al. are for
three-component tones monotically presented (all of the stimulus entering one ear), and
those of Gerson & Goldstein for four-component tones dichotically presented (part of
the stimulus entering one ear and the rest of the stimulus the other, contralateral, ear);
the harmonic numbers of the partials present in the stimuli are shown beside the data.
The pitch-shift effect we predict from three-frequency resonance, taking into account the
dominance region, is shown superimposed on the data as solid lines given by the
equations $P = g + (f - ng)/(n + 1/2)$ (primary lines), $P = g/2 + (f - (n + 1/2)g)/(2n + 2)$
(secondary lines), and $P = g/4 + (f - (n - 1/4)g)/(4n + 1)$ (tertiary line); the harmonic
numbers of the partials used to calculate the pitch-shift lines are shown enclosed in red
squares. For primary lines these harmonic numbers correspond to $n$ and $n + 1$, for
secondary lines to $2n + 1$ and $2n + 3$, and for the tertiary line to $4n + 1$ and $4n + 5$. A red
circle, instead of a square, signifies that the component is not physically present in the
stimulus, but corresponds to a combination tone. The inset at bottom right corresponds
to the slopes of the data averaged over the distinct experimental values plotted as a
function of harmonic number. The blue squares are the data of Gerson & Goldstein, the red
squares are those of Schouten, and lastly, the blue circles are data of Patterson for six
and twelve-component tones which are averaged over different experimental situations
that represent several thousand points. The black diamonds correspond to our theory
and show that the data of Gerson & Goldstein and those of Patterson saturate for
different values of $k$ (the experimental conditions were different).

But for the more complex case of low-$k$ stimuli, not only quantitative, but also
qualitative, differences arise between the two-lowest-component theory and experiment. The
most interesting feature seen in the data of Fig. 2 is a second series of pitch-shift lines
clustered around the pitch of 100 Hz. This too can be explained within the framework of
our ideas. Recall that for small frequency detuning $\Delta\omega$, the frequency ratio between
adjacent stimulus components can be approximated by the quotient of two integers
differing by unity: $\omega_2/\omega_1 \approx (n+1)/n$. However, if we relax the small-detuning constraint, so
that $\Delta\omega$ becomes large, we can move to a case where $\omega_2/\omega_1$ can better be approximated by
$(n+2)/(n+1)$. But, by the usual Farey sum operation between rational numbers, we know
that there exists between these two regions an interval in which the frequency ratio can be
better approximated by $(2n+3)/(2n+1)$. In this interval, then, the main three-frequency
resonance is $[-1, -1, 4n+4]$, giving a response frequency $P = (\omega_1 + \omega_2)/(4n+4)$, which
produces a pitch-shift line with slope $1/(2n+2)$ around $\omega_0/2 = 100$ Hz for the case
analysed. Of course, if prefiltering produces a saturation of the slopes of the primary
pitch-shift lines, the same should occur for these secondary ones. In Fig. 2 we show our
predictions for the secondary lines taking into account the dominance effect. The
agreement, both qualitative and quantitative, is impressive. Moreover, a small group of
data points indicates the existence of a further level of pitch-shift lines clustered around
50 Hz in a region between a primary and a secondary pitch-shift line. We can understand
this level in the same way as above, and we plot our prediction for its pitch-shift line
in Fig. 2. This hierarchical arrangement of the perception of pitch of complex tones is
entirely consistent with the universal devil's staircase structure that dynamical systems
theory predicts for the three-frequency resonances in quasiperiodically forced
dynamical systems. Further evidence comes from psychophysical experiments with pure tones.
These, presented under particular experimental conditions, also elicit a residue
sensation. The extremes of the three-frequency staircase correspond to subharmonics of only
one external frequency, and thus these are the expected responses when only one
stimulus component is present. As the results of Houtgast show, these subharmonics are
indeed perceived.
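
The primary and secondary pitch-shift lines quoted in the caption of Fig. 2 can be checked against the resonance formula. The sketch below is illustrative only: the function names are hypothetical, the two driving components are taken to be $\omega_1 = f$ and $\omega_2 = f + g$, and the dominance effect (the choice of effective $n$) is deliberately left out.

```python
from fractions import Fraction


def primary_line(f, g, n):
    """Primary pitch-shift line from the Fig. 2 caption."""
    return g + (f - n * g) / (n + Fraction(1, 2))


def secondary_line(f, g, n):
    """Secondary pitch-shift line from the Fig. 2 caption."""
    return g / 2 + (f - (n + Fraction(1, 2)) * g) / (2 * n + 2)


def resonance_pitch(f, g, r):
    """P = (w1 + w2) / r for components w1 = f and w2 = f + g."""
    return (2 * f + g) / r


g, n = Fraction(200), 4
for f in (Fraction(800), Fraction(890), Fraction(900), Fraction(950)):
    # Primary line: resonance [-1, -1, 2n+1]; secondary: [-1, -1, 4n+4].
    assert primary_line(f, g, n) == resonance_pitch(f, g, 2 * n + 1)
    assert secondary_line(f, g, n) == resonance_pitch(f, g, 4 * n + 4)

print(primary_line(Fraction(890), g, n))    # 220
print(secondary_line(Fraction(900), g, n))  # 100
```

With $g = 200$ Hz the primary lines pass through 200 Hz and the secondary lines through 100 Hz, matching the two clusters described in the text.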

Discussion 

A dynamical attractor can be studied by means of time or frequency analysis. Both 
are common techniques in dynamical-systems analysis, but one is not inherently more 
fundamental than the other, nor are these the only two tools available. For this reason, 
and because our reasoning makes no use of a particular physiological implementation, 
our results cannot be included directly in either the spectral or the temporal classes of
models of pitch perception. What we have proposed is not a model, but a mathematical 
basis for the perception of pitch that uses the universality of responses of dynamical sys- 
tems to address the question of why the auditory system should behave as it does when 
confronted by stimuli consisting of complex tones. Not all pitch perception phenomena 
are explicable in terms of universality; nor should they be, since some will depend on the 
specific details of the neural circuitry. However, this is a powerful way of approaching the 
problem that is capable of explaining many experimental data considered difficult to
understand. Future pitch models can surely incorporate these results in their frameworks.
Spectral models can use these ideas since they make consistent use of different kinds
of harmonic templates, and three-frequency resonances offer in a natural way optimized
candidates for the base frequency of such templates without the need to include
stochastic terms. Temporal models can apply these results as they need some kind of locking
of neural spiking to the fine structure of the stimulus, and three-frequency resonances 
are the natural extension of phase locking to the more complicated case of quasiperiodic 
forcing that is typically related to the perception of complex tones. A dynamical-systems 
viewpoint can then integrate spectral and temporal hypotheses into a coherent unified 
approach to pitch perception incorporating both sets of ideas. 

We have shown that universal properties of dynamical responses in nonlinear
systems are reflected in the pitch perception of complex tones. In previous work, we
argued that a dynamical-systems approach backs up experimental evidence for subcortical
pitch processing in humans. The experimental evidence is not conclusive, as studies
with monkeys have found that raw spectral information is present in the primary
auditory cortex. However, whether this processing occurs in, or before, the auditory cortex,
the dynamical mechanism we envisage greatly facilitates processing of information into
a single percept. Pitch processing may then prove to be an example in which universality
in nonlinear dynamics can help to explain complex experimental results in biology. The
auditory system possesses an astonishing capability for processing pitch-related
information in real time; what we have demonstrated here is how, at a fundamental level, this
can be so.

Appendix: Universality in Nonlinear Systems



Nonlinear systems exhibit universal responses under external forcing: 



Harmonics from periodically forced passive nonlinearities 



[Diagram: input $\omega_1$ -> passive nonlinearity -> outputs $\omega_1, 2\omega_1, 3\omega_1, \ldots$]



A single frequency periodically forcing a passive (sometimes termed static) nonlinearity
generates higher harmonics (overtones) $2\omega_1, 3\omega_1, \ldots$ of a fundamental $\omega_1$, given by
$p\omega_1 + \omega_H = 0$ with $p$ integer. This is seen in acoustics as harmonic distortion.



Combination tones from quasiperiodically forced passive nonlinearities

A passive nonlinearity forced quasiperiodically by two sources generates combination
tones $\omega_1 - \omega_2, \omega_1 + \omega_2, \ldots$, which are solutions of the equation
$p\omega_1 + q\omega_2 + \omega_C = 0$, where $p$ and $q$ are integers. They are found as distortion
products in acoustics.
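
A small numerical experiment makes the appearance of combination tones concrete. This is a sketch under stated assumptions: the quadratic distortion `x + 0.5 * x**2` is a hypothetical stand-in for a passive nonlinearity (nothing in the text specifies its form), the input frequencies 800 Hz and 1000 Hz are example values, and `amplitude_at` is a hypothetical helper that evaluates a single Fourier component by direct correlation.

```python
import math


def amplitude_at(signal, freq_hz, sample_rate):
    """Amplitude of the freq_hz component of signal (one Fourier bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


sample_rate = 8000                       # samples per second
w1, w2 = 800.0, 1000.0                   # forcing frequencies in Hz
times = [i / sample_rate for i in range(sample_rate)]   # one second

# Two-tone input and its image under an assumed quadratic nonlinearity.
x = [math.cos(2 * math.pi * w1 * t) + math.cos(2 * math.pi * w2 * t)
     for t in times]
y = [xi + 0.5 * xi ** 2 for xi in x]

for freq in (w2 - w1, w1 + w2):
    print(freq, round(amplitude_at(x, freq, sample_rate), 6),
          round(amplitude_at(y, freq, sample_rate), 6))
```

The input has no energy at 200 Hz or 1800 Hz, whereas the distorted signal contains both the difference and the sum tone; expanding the square shows each should have amplitude 0.5 for this particular choice of nonlinearity.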



Subharmonics, or two-frequency resonances from periodically forced dynamical systems

With a periodically forced active nonlinearity (a dynamical system) more complex
subharmonic responses $\omega_1/r, 2\omega_1/r, \ldots, (r-1)\omega_1/r$ known as mode lockings or
two-frequency resonances are generated. These are given by $p\omega_1 + r\omega_{2R} = 0$, where $p$ and
$r$ are integers. As some parameter is varied, different resonances are found that remain
stable over an interval. A classical representation of this, known as the devil's staircase,
is shown in Fig. 3.
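
Mode locking of this kind can be reproduced with a standard textbook system. The sketch below uses the sine circle map, which is an assumption made for illustration: the text does not say which system produced Fig. 3, and the parameter names `omega` (bare frequency ratio) and `coupling` are hypothetical. The rotation number is estimated as the mean advance per iteration of the lifted map.

```python
import math


def rotation_number(omega, coupling=1.0, n_iter=2000, n_transient=500):
    """Estimate the rotation number of the sine circle map
    theta -> theta + omega - (coupling / (2 pi)) * sin(2 pi theta)."""
    def step(theta):
        return theta + omega - coupling / (2 * math.pi) * math.sin(2 * math.pi * theta)

    theta = 0.0
    for _ in range(n_transient):      # discard the transient
        theta = step(theta)
    start = theta
    for _ in range(n_iter):           # iterate the lift, without taking mod 1
        theta = step(theta)
    return (theta - start) / n_iter


# Scanning omega traces out the staircase: plateaux appear where the
# rotation number stays locked to a rational value.
for omega in (0.10, 0.49, 0.50, 0.51):
    print(omega, round(rotation_number(omega), 3))
```

Nearby values of `omega` around 0.5 give the same locked value 1/2, which is what a plateau of the staircase looks like numerically.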

Figure 3: Two-frequency devil's staircase. The rotation number, the frequency ratio
$\rho = -p/r = \omega_{2R}/\omega_1$, is plotted against the period of the external force.

We see that the resonances are hierarchically arranged. The local ordering can be
described by the Farey sum: If two rational numbers $a/c$ and $b/d$ satisfy $|ad - bc| = 1$ we say
that they are unimodular, or adjacents, and we can find between them a unique rational
with minimal denominator. This rational is called the mediant and can be expressed as
a Farey sum operation $a/c \oplus b/d = (a + b)/(c + d)$. The resonance characterized by the
mediant is the widest between those represented by the adjacents.
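
The Farey sum of two adjacent rationals is simple to compute. The following is a minimal sketch with hypothetical function names; the example pair 5/4 and 6/5 corresponds to $(n+1)/n$ and $(n+2)/(n+1)$ with $n = 4$, whose mediant is the ratio $(2n+3)/(2n+1)$ used in the Results for the secondary pitch-shift lines.

```python
from fractions import Fraction


def are_adjacent(x, y):
    """True if a/c and b/d satisfy |ad - bc| = 1."""
    return abs(x.numerator * y.denominator - y.numerator * x.denominator) == 1


def farey_sum(x, y):
    """Mediant (a + b)/(c + d) of two adjacent rationals a/c and b/d."""
    if not are_adjacent(x, y):
        raise ValueError("the two rationals are not adjacent")
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


print(farey_sum(Fraction(0, 1), Fraction(1, 1)))   # 1/2
print(farey_sum(Fraction(5, 4), Fraction(6, 5)))   # 11/9
```

Repeating the operation between each new mediant and its parents generates the hierarchy of plateaux, widest first.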

Three-frequency resonances from quasiperiodically forced dynamical systems 



[Diagram: a dynamical system forced quasiperiodically by two input frequencies]
Quasiperiodically forced dynamical systems show a great variety of qualitative responses
that fall into three main categories: there are periodic attractors, quasiperiodic
attractors, and chaotic and nonchaotic strange attractors. Here we concentrate on the
three-frequency resonances produced by two-frequency quasiperiodic attractors as the natural
candidates for modelling the residue. Three-frequency resonances are given by the
nontrivial solutions of the equation $p\omega_1 + q\omega_2 + r\omega_{3R} = 0$, where $p$, $q$, and $r$ are integers,
$\omega_1$ and $\omega_2$ are the forcing frequencies, and $\omega_{3R}$ is the resonant response, and can be
written compactly in the form $[p, q, r]$. Combination tones are three-frequency resonances of the
restricted class $[p, q, 1]$. This is the only type of response possible from a passive
nonlinearity, whereas a dynamical system such as a forced oscillator is an active nonlinearity with
at least one intrinsic frequency, and can exhibit the full panoply of three-frequency
resonances, which include subharmonics of combination tones. Three-frequency resonances
obey hierarchical ordering properties very similar to those governing two-frequency
resonances in periodically forced systems. In the interval $(\omega_2/p, \omega_1/q)$, we may define a
generalized Farey sum between any pair of adjacents as
$\omega_1/c \oplus \omega_2/d = (\omega_1 + \omega_2)/(c + d)$.
The daughter three-frequency resonance characterized by the generalized mediant is the
widest between its parents characterized by the adjacents. Thus three-frequency
resonances are ordered very similarly to their counterparts in two-frequency systems, and
form their own devil's staircase; Fig. 4.

Figure 4: Three-frequency devil's staircase. Contrary to the case of periodically driven
systems, where plateaux represent periodic solutions, here they represent quasiperiodic
solutions (only the third frequency is represented in the ordinate). We have investigated
these properties in three different systems: the quasiperiodic circle map, a system of
coupled electronic oscillators and a set of ordinary nonlinear differential equations, with the
same qualitative results, which confirm the theoretical predictions.
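
As a worked instance of the generalized Farey sum, the parents used in the main text can be combined explicitly. The block below is a sketch that writes the response frequency as $P$, as in the main text, and assumes a positive detuning $\Delta\omega$ for the final inequality.

```latex
\[
  \frac{\omega_1}{k} \oplus \frac{\omega_2}{k+1}
  = \frac{\omega_1 + \omega_2}{2k+1} = P
  \quad\Longrightarrow\quad
  -\omega_1 - \omega_2 + (2k+1)\,P = 0 ,
\]
which is the resonance $[-1, -1, 2k+1]$. Substituting
$\omega_1 = k\omega_0 + \Delta\omega$ and $\omega_2 = (k+1)\omega_0 + \Delta\omega$ gives
\[
  P = \omega_0 + \frac{\Delta\omega}{k + \tfrac{1}{2}},
  \qquad
  \frac{\omega_2}{k+1} = \omega_0 + \frac{\Delta\omega}{k+1}
  \;<\; P \;<\;
  \omega_0 + \frac{\Delta\omega}{k} = \frac{\omega_1}{k}
  \quad (\Delta\omega > 0).
\]
```

The daughter therefore lies between its two parents, and its slope $1/(k + 1/2)$ with respect to the detuning is the slope of the primary pitch-shift lines.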



